Solving mushroom classification problem from https://github.com/pbiecek/InterpretableMachineLearning2020/issues/5
I will train a random forest model, then compare its LIME explanations with those of a logistic regression model.
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import lime
import lime.lime_tabular
Load the data
data = pd.read_csv("dataset_24_mushroom.csv")
#remove apostrophes from values
for col in data.columns:
    data[col] = data[col].str.replace("'", "", regex=False)
data.head()
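The cleanup above can be sketched on a toy Series (assuming the OpenML export wraps each value in single quotes, as `dataset_24_mushroom.csv` does):

```python
import pandas as pd

# The OpenML CSV stores values like "'x'"; str.replace strips the quotes.
s = pd.Series(["'x'", "'s'", "'n'"])
cleaned = s.str.replace("'", "", regex=False)
print(cleaned.tolist())  # ['x', 's', 'n']
```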
|   | cap-shape | cap-surface | cap-color | bruises%3F | odor | gill-attachment | gill-spacing | gill-size | gill-color | stalk-shape | ... | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat | class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | x | s | n | t | p | f | c | n | k | e | ... | w | w | p | w | o | p | k | s | u | p |
| 1 | x | s | y | t | a | f | c | b | k | e | ... | w | w | p | w | o | p | n | n | g | e |
| 2 | b | s | w | t | l | f | c | b | n | e | ... | w | w | p | w | o | p | n | n | m | e |
| 3 | x | y | w | t | p | f | c | n | n | e | ... | w | w | p | w | o | p | k | s | u | p |
| 4 | x | s | g | f | n | f | w | b | k | t | ... | w | w | p | w | o | e | n | a | g | e |
5 rows × 23 columns
The target is in column "class": "p" means poisonous, "e" means edible. Let's preprocess the data, i.e., encode the categorical values as integers.
#Lime requires numerical data, so use LabelEncoder
X = data.drop(columns=["class"])
class_enc = preprocessing.LabelEncoder().fit(data["class"])
y = class_enc.transform(data["class"])
encoders = {}
categorical_names = {}
for col in X.columns:
    encoders[col] = preprocessing.LabelEncoder().fit(X[col])
    categorical_names[X.columns.get_loc(col)] = encoders[col].classes_
    X[col] = encoders[col].transform(X[col])
print(X.shape)
print(len(y))
print("Class label mapping: " + str(class_enc.classes_))
print("Poisonous mushrooms: %d" % np.sum(y))
(8124, 22)
8124
Class label mapping: ['e' 'p']
Poisonous mushrooms: 3916
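Note that `LabelEncoder` sorts the classes alphabetically, so here `'e'` maps to 0 and `'p'` to 1, which is why `np.sum(y)` counts the poisonous mushrooms. A minimal round trip on a toy array:

```python
import numpy as np
from sklearn import preprocessing

# classes_ is sorted alphabetically: ['e', 'p'] -> 'e'=0, 'p'=1
enc = preprocessing.LabelEncoder().fit(np.array(["p", "e", "p", "e"]))
print(enc.classes_)                  # ['e' 'p']
codes = enc.transform(["p", "e"])
print(codes)                         # [1 0]
print(enc.inverse_transform(codes))  # ['p' 'e']
```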
Train and evaluate model
encoder = preprocessing.OneHotEncoder().fit(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=3234)
X_train_enc = encoder.transform(X_train)
classifier = RandomForestClassifier().fit(X_train_enc, y_train)
pred = classifier.predict(encoder.transform(X_test))
acc = np.mean(pred == y_test)
print("Accuracy: %f" % acc)
Accuracy: 1.000000
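A side note: fitting the `OneHotEncoder` on all of `X` before the split leaks the test-set category vocabulary into preprocessing. That is harmless here (every category also occurs in training), but wrapping the encoder and model in a `Pipeline` with `handle_unknown="ignore"` avoids it entirely. A sketch on synthetic categorical data (the mushroom CSV itself is not loaded in this snippet):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Synthetic stand-in: 300 rows, 4 categorical features, label determined by feature 0
X_toy = rng.integers(0, 3, size=(300, 4))
y_toy = (X_toy[:, 0] == 1).astype(int)

pipe = Pipeline([
    ("onehot", OneHotEncoder(handle_unknown="ignore")),  # refit on each training fold only
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
])
scores = cross_val_score(pipe, X_toy, y_toy, cv=5)
print(scores.mean())
```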
We got perfect accuracy. Now let's look at the LIME explanations.
explainer = lime.lime_tabular.LimeTabularExplainer(X_train.to_numpy(), feature_names=X.columns,
                                                   categorical_features=range(22), categorical_names=categorical_names,
                                                   class_names=['edible', 'poisonous'])
#Combine the classifier with the encoder, so lime can query it on label-encoded rows
def predict(data):
    return classifier.predict_proba(encoder.transform(data))
exp = explainer.explain_instance(X_test.to_numpy()[1], predict, num_features=10)
exp.show_in_notebook()
exp = explainer.explain_instance(X_test.to_numpy()[1234], predict, num_features=10)
exp.show_in_notebook()
exp = explainer.explain_instance(X_test.to_numpy()[934], predict, num_features=10)
exp.show_in_notebook()
As we see, in all explanations almost all important features have the same sign and moderate magnitudes. The examples suggest that odor and gill size are the deciding factors.
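What LIME does under the hood can be sketched without the library: perturb the instance, weight the neighbors by proximity, and fit a weighted linear surrogate to the black-box probabilities. This is a simplified sketch of the idea (binary features, flip-based perturbation, exponential kernel), not LIME's exact sampling scheme:

```python
import numpy as np
from sklearn.linear_model import Ridge

def local_surrogate(predict_proba, x, n_samples=500, flip_prob=0.3, seed=0):
    """Fit a weighted linear surrogate around a binary-feature instance x."""
    rng = np.random.default_rng(seed)
    # Perturb: randomly flip features (LIME resamples categories instead)
    mask = rng.random((n_samples, x.size)) < flip_prob
    Z = np.where(mask, 1 - x, x)
    # Proximity kernel: closer neighbors get larger weight
    dist = np.abs(Z - x).sum(axis=1)
    weights = np.exp(-dist)
    # The surrogate's coefficients play the role of local attributions
    surrogate = Ridge(alpha=1.0).fit(Z, predict_proba(Z), sample_weight=weights)
    return surrogate.coef_

# Toy black box: probability driven entirely by feature 0
black_box = lambda Z: 0.8 * Z[:, 0] + 0.1
x0 = np.array([1, 0, 1, 0])
coefs = local_surrogate(black_box, x0)
print(coefs)  # feature 0 dominates the attribution
```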
Let's train another model (Logistic Regression) and compare explanations.
logistic_classifier = LogisticRegression().fit(X_train_enc, y_train)
pred = logistic_classifier.predict(encoder.transform(X_test))
acc = np.mean(pred == y_test)
print("Accuracy: %f" % acc)
def predict_l(data):
    return logistic_classifier.predict_proba(encoder.transform(data))
Accuracy: 1.000000
exp = explainer.explain_instance(X_test.to_numpy()[1], predict_l, num_features=10)
exp.show_in_notebook()
exp = explainer.explain_instance(X_test.to_numpy()[1234], predict_l, num_features=10)
exp.show_in_notebook()
exp = explainer.explain_instance(X_test.to_numpy()[934], predict_l, num_features=10)
exp.show_in_notebook()
As we see, the explanations differ between models. Both models achieve 100% accuracy and are almost 100% confident on all example instances, but logistic regression produces much larger attributions. The signs of the attributions and the order of the most important features are similar for both models.
As we know from the analysis of this task in previous classes, a handful of features suffice to classify almost all cases correctly. Our explanations likewise suggest that gill size and odor are the most important features.
The attributions for logistic regression are larger in magnitude than those for the random forest. This may mean that the logistic regression model's output varies more in the local neighborhood of an instance. Both models give the same answers on the real data, so such behavior would indicate that they differ on the perturbed samples generated by LIME.
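This local-variability hypothesis can be probed directly: perturb an instance and measure how much each model's predicted probability moves over the neighborhood. A sketch on synthetic data, since it only illustrates the measurement, not the mushroom result:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic binary-feature data with one decisive feature
X_syn = rng.integers(0, 2, size=(1000, 10))
y_syn = X_syn[:, 0]

rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_syn, y_syn)
lr = LogisticRegression(max_iter=1000).fit(X_syn, y_syn)

# Neighborhood of one instance: flip each feature with probability 0.2
x = X_syn[0]
mask = rng.random((500, 10)) < 0.2
Z = np.where(mask, 1 - x, x)

stds = {}
for name, model in [("random forest", rf), ("logistic regression", lr)]:
    p = model.predict_proba(Z)[:, 1]
    stds[name] = float(p.std())
    print(f"{name}: std of P(class=1) over neighbors = {stds[name]:.3f}")
```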